GitHub Repository: debakarr/machinelearning
Path: blob/master/Part 9 - Dimension Reduction/Principal Component Analysis/[R] Principal Component Analysis.ipynb

Kernel: R

Principal Component Analysis

Data preprocessing

In [1]:

# Import the dataset
dataset = read.csv('Wine.csv')

In [2]:

head(dataset, 10)

Out[2]:

In [3]:

# Splitting the dataset into the Training set and Test set
library(caTools)
set.seed(42)
split = sample.split(dataset$Customer_Segment, SplitRatio = 0.8)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
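`sample.split` from caTools splits the rows so that each `Customer_Segment` value keeps roughly the same relative frequency in both parts. The base-R sketch below makes that stratified behaviour concrete. `stratified_split` is a hypothetical helper and not caTools' implementation, and the built-in `iris` data stands in for Wine.csv.

```r
# Hypothetical base-R sketch of a stratified split (NOT caTools::sample.split itself).
# Returns a logical vector: TRUE = training row, FALSE = test row.
stratified_split <- function(labels, ratio = 0.8) {
  labels <- as.character(labels)
  in_train <- logical(length(labels))
  for (lvl in unique(labels)) {
    idx <- which(labels == lvl)              # rows belonging to this class
    n_train <- round(length(idx) * ratio)    # same ratio inside every class
    # index into idx instead of sample(idx, ...), which misbehaves when length(idx) == 1
    chosen <- idx[sample.int(length(idx), n_train)]
    in_train[chosen] <- TRUE
  }
  in_train
}

# Usage with iris (50 rows per species) in place of the Wine data
set.seed(42)
split <- stratified_split(iris$Species, ratio = 0.8)
table(iris$Species[split])    # 40 rows per species in the training part
table(iris$Species[!split])   # 10 rows per species in the test part
```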

In [4]:

head(training_set, 10)

Out[4]:

In [5]:

head(test_set, 10)

Out[5]:

In [6]:

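# Feature Scaling: standardize with the training set's mean and sd, then reuse them on the test set
train_scaled = scale(training_set[-14])
training_set[-14] = train_scaled
test_set[-14] = scale(test_set[-14],
                      center = attr(train_scaled, 'scaled:center'),
                      scale = attr(train_scaled, 'scaled:scale'))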
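`scale()` standardizes each column as (x - mean) / sd and records the values it used in the attributes `scaled:center` and `scaled:scale`. Passing those attributes back to `scale()` puts new rows on the training set's scale instead of recomputing the statistics from the test rows. A small sketch, assuming `iris` in place of the 13 Wine predictors and an arbitrary 120/30 row split:

```r
# iris stands in for the Wine predictors (assumption for illustration)
set.seed(42)
rows <- sample(nrow(iris), 120)
train_x <- iris[rows, 1:4]
test_x  <- iris[-rows, 1:4]

train_scaled <- scale(train_x)
mu    <- attr(train_scaled, "scaled:center")  # per-column training means
sigma <- attr(train_scaled, "scaled:scale")   # per-column training standard deviations

# Apply the training parameters to the held-out rows
test_scaled <- scale(test_x, center = mu, scale = sigma)

# The same transform written out by hand: (x - mean) / sd, column by column
manual <- sweep(sweep(as.matrix(test_x), 2, mu, "-"), 2, sigma, "/")

colMeans(train_scaled)      # ~0 for every column
apply(train_scaled, 2, sd)  # 1 for every column
all.equal(test_scaled, manual, check.attributes = FALSE)  # TRUE
```

Note that the test columns are generally not exactly mean 0 / sd 1 afterwards; that is expected, because they are measured on the training set's scale.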

In [7]:

head(training_set, 10)

Out[7]:

In [8]:

head(test_set, 10)

Out[8]:

Applying Principal Component Analysis

In [9]:

library(caret)
library(e1071)

Out[9]:

Loading required package: lattice
Loading required package: ggplot2

In [10]:

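# Fit PCA on the 13 training predictors only (caret centers and scales them first)
# and keep the first two principal components
pca = preProcess(x = training_set[-14],
                 method = 'pca',
                 pcaComp = 2)

# Replace the predictors with PC1 and PC2; Customer_Segment is passed through unchanged
training_set = predict(pca, training_set)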
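`preProcess(method = 'pca')` learns a centering, a scaling and a rotation from the training predictors, and `predict` then projects any data set onto the first `pcaComp` components using those learned values. The same idea can be sketched with base R's `prcomp` as a stand-in for caret (its component signs may differ from caret's output). `iris` again stands in for the Wine data.

```r
# iris stands in for the Wine predictors (assumption for illustration)
set.seed(42)
rows <- sample(nrow(iris), 120)
train_x <- iris[rows, 1:4]
test_x  <- iris[-rows, 1:4]

# Center and scale, then compute the principal axes from the training rows only
pca_fit <- prcomp(train_x, center = TRUE, scale. = TRUE)

# Proportion of the total variance carried by each component (decreasing)
var_explained <- pca_fit$sdev^2 / sum(pca_fit$sdev^2)
round(var_explained, 3)

# Keep two components for both sets; the test rows reuse the training center/scale/rotation
train_pc <- predict(pca_fit, newdata = train_x)[, 1:2]
test_pc  <- predict(pca_fit, newdata = test_x)[, 1:2]

# The projection written out: standardized data %*% first two rotation columns
manual <- scale(test_x, pca_fit$center, pca_fit$scale) %*% pca_fit$rotation[, 1:2]
all.equal(test_pc, manual, check.attributes = FALSE)  # TRUE
```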

In [11]:

head(training_set, 10)

Out[11]:

In [12]:

training_set = training_set[c(2, 3, 1)] # Reorder the columns to PC1, PC2, Customer_Segment

In [13]:

head(training_set, 10)

Out[13]:

In [14]:

test_set = predict(pca, test_set) # Project with the PCA fitted on the training set
test_set = test_set[c(2, 3, 1)] # Reorder the columns to PC1, PC2, Customer_Segment

In [15]:

head(test_set, 10)

Out[15]:

Fitting classifier to the Training set

In [16]:

classifier = svm(formula = Customer_Segment ~ ., 
                 data = training_set, 
                 type = 'C-classification', 
                 kernel = 'radial')
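With `kernel = 'radial'`, e1071's `svm` uses the Gaussian RBF kernel K(u, v) = exp(-gamma * |u - v|^2), and `gamma` defaults to 1 / (data dimension), which is 0.5 here because the model sees only PC1 and PC2. A minimal sketch of that kernel; `rbf_kernel` and `gram_matrix` are illustrative names, not e1071 functions:

```r
# Gaussian RBF kernel: similarity decays with squared Euclidean distance
rbf_kernel <- function(u, v, gamma) {
  exp(-gamma * sum((u - v)^2))
}

gamma_default <- 1 / 2  # 1 / (number of predictors): PC1 and PC2

rbf_kernel(c(0, 0), c(0, 0), gamma_default)  # 1 for identical points
rbf_kernel(c(0, 0), c(1, 1), gamma_default)  # exp(-0.5 * 2) = exp(-1), about 0.368
rbf_kernel(c(0, 0), c(3, 4), gamma_default)  # exp(-0.5 * 25), close to 0

# Kernel (Gram) matrix for a few points, built from pairwise distances
gram_matrix <- function(X, gamma) exp(-gamma * as.matrix(dist(X))^2)
X <- rbind(c(0, 0), c(1, 1), c(3, 4))
round(gram_matrix(X, gamma_default), 3)
```

A larger `gamma` makes the similarity fall off faster, which tends to produce tighter, more irregular decision regions in the plots below.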

Predicting the Test set results

In [17]:

y_pred = predict(classifier, newdata = test_set[-3])

In [18]:

head(y_pred, 10)

Out[18]:

In [19]:

head(test_set[3], 10)

Out[19]:

Making the Confusion Matrix

In [20]:

cm = table(test_set[, 3], y_pred)

In [21]:

cm

Out[21]:

The classifier made 11 + 14 + 10 = 35 correct predictions and 1 incorrect prediction on the 36 test observations.
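In `cm`, rows are the actual segments and columns the predicted ones, so correct predictions sit on the diagonal and accuracy is the diagonal sum over the total. A short sketch; the `accuracy` helper and the toy labels are illustrative, while the counts come from the sentence above:

```r
# Accuracy from a confusion matrix: diagonal = correct predictions
accuracy <- function(cm) sum(diag(cm)) / sum(cm)

# Counts quoted above: 11 + 14 + 10 correct plus 1 incorrect
correct <- 11 + 14 + 10
total   <- correct + 1
correct / total  # 35 / 36, about 0.972

# The helper applied to a table() of actual vs predicted labels (toy data)
actual    <- factor(c(1, 1, 2, 2, 3, 3), levels = 1:3)
predicted <- factor(c(1, 1, 2, 3, 3, 3), levels = 1:3)
toy_cm <- table(actual, predicted)
accuracy(toy_cm)  # 5 / 6
```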

Visualizing the Training set results

In [22]:

# The plotting cells below use only base R graphics, so ElemStatLearn is not required
# library(ElemStatLearn)

In [23]:

set = training_set

In [24]:

# Build a fine grid over the plotted region (1 unit of padding on each side)
X1 = seq(min(set[, 1]) - 1, max(set[, 1]) + 1, by = 0.01)
X2 = seq(min(set[, 2]) - 1, max(set[, 2]) + 1, by = 0.01)
grid_set = expand.grid(X1, X2)
colnames(grid_set) = c('PC1', 'PC2')
# Predict a segment for every grid point to colour the decision regions
y_grid = predict(classifier, newdata = grid_set)
plot(set[, -3],
     main = 'Kernel SVM (Training set)',
     xlab = '1st Principal Component', ylab = '2nd Principal Component',
     xlim = range(X1), ylim = range(X2))
# Draw region boundaries, shade the regions, then overlay the observations
contour(X1, X2, matrix(as.numeric(y_grid), length(X1), length(X2)), add = TRUE)
points(grid_set, pch = '.', col = ifelse(y_grid == 2, 'lightblue', ifelse(y_grid == 1, 'springgreen3', 'tomato')))
points(set[, -3], pch = 21, bg = ifelse(set[, 3] == 2, 'blue3', ifelse(set[, 3] == 1, 'green4', 'red3')), col = 'white')
# Customer_Segment takes the values 1, 2 and 3; the legend matches the colours above
legend("topright", legend = c("1", "2", "3"), pch = 16, col = c('green4', 'blue3', 'red3'))
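The reshape `matrix(as.numeric(y_grid), length(X1), length(X2))` lines up with `contour(X1, X2, z)` because `expand.grid()` varies its first argument fastest and `matrix()` fills column by column, so entry `[i, j]` holds the prediction for the point `(X1[i], X2[j])`, which is the layout `contour()` expects. The sketch below uses tiny toy coordinates, and the "prediction" is simply PC1 + PC2, an illustrative stand-in for the classifier that makes the layout visible:

```r
X1 <- c(10, 20, 30)
X2 <- c(1, 2)
grid_set <- expand.grid(X1, X2)        # rows: (10,1), (20,1), (30,1), (10,2), ...
colnames(grid_set) <- c("PC1", "PC2")

# Stand-in "prediction" that encodes both coordinates
z_values <- grid_set$PC1 + grid_set$PC2
z <- matrix(z_values, length(X1), length(X2))

z
#      [,1] [,2]
# [1,]   11   12
# [2,]   21   22
# [3,]   31   32
```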

Out[24]:

Visualizing the Test set results

In [25]:

set = test_set

In [26]:

# Build a fine grid over the plotted region (1 unit of padding on each side)
X1 = seq(min(set[, 1]) - 1, max(set[, 1]) + 1, by = 0.01)
X2 = seq(min(set[, 2]) - 1, max(set[, 2]) + 1, by = 0.01)
grid_set = expand.grid(X1, X2)
colnames(grid_set) = c('PC1', 'PC2')
# Predict a segment for every grid point to colour the decision regions
y_grid = predict(classifier, newdata = grid_set)
plot(set[, -3],
     main = 'Kernel SVM (Test set)',
     xlab = '1st Principal Component', ylab = '2nd Principal Component',
     xlim = range(X1), ylim = range(X2))
# Draw region boundaries, shade the regions, then overlay the observations
contour(X1, X2, matrix(as.numeric(y_grid), length(X1), length(X2)), add = TRUE)
points(grid_set, pch = '.', col = ifelse(y_grid == 2, 'lightblue', ifelse(y_grid == 1, 'springgreen3', 'tomato')))
points(set[, -3], pch = 21, bg = ifelse(set[, 3] == 2, 'blue3', ifelse(set[, 3] == 1, 'green4', 'red3')), col = 'white')
# Customer_Segment takes the values 1, 2 and 3; the legend matches the colours above
legend("topright", legend = c("1", "2", "3"), pch = 16, col = c('green4', 'blue3', 'red3'))
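The training and test visualisation cells differ only in `set` and the plot title, so the drawing code can be wrapped in one function. This is a sketch under stated assumptions: `plot_decision_regions`, its `predict_fun` and `step` arguments, and the toy data and rule are hypothetical, introduced only for illustration. It expects a data frame whose columns are PC1, PC2 and a segment label in 1, 2, 3.

```r
# Hypothetical helper: predict_fun maps a data frame with columns PC1, PC2 to labels 1, 2, 3
plot_decision_regions <- function(set, predict_fun, main, step = 0.01) {
  X1 <- seq(min(set[, 1]) - 1, max(set[, 1]) + 1, by = step)
  X2 <- seq(min(set[, 2]) - 1, max(set[, 2]) + 1, by = step)
  grid_set <- expand.grid(X1, X2)
  colnames(grid_set) <- c("PC1", "PC2")
  y_grid <- predict_fun(grid_set)
  plot(set[, -3], main = main,
       xlab = "1st Principal Component", ylab = "2nd Principal Component",
       xlim = range(X1), ylim = range(X2))
  contour(X1, X2, matrix(as.numeric(y_grid), length(X1), length(X2)), add = TRUE)
  points(grid_set, pch = ".",
         col = ifelse(y_grid == 2, "lightblue", ifelse(y_grid == 1, "springgreen3", "tomato")))
  points(set[, -3], pch = 21, col = "white",
         bg = ifelse(set[, 3] == 2, "blue3", ifelse(set[, 3] == 1, "green4", "red3")))
  legend("topright", legend = c("1", "2", "3"), pch = 16,
         col = c("green4", "blue3", "red3"))
  invisible(y_grid)  # grid predictions, returned for inspection
}

# Usage sketch: toy points and a toy threshold rule in place of the SVM
set.seed(42)
toy <- data.frame(PC1 = rnorm(30), PC2 = rnorm(30))
toy$Customer_Segment <- ifelse(toy$PC1 < -0.5, 1, ifelse(toy$PC1 < 0.5, 2, 3))
toy_rule <- function(g) ifelse(g$PC1 < -0.5, 1, ifelse(g$PC1 < 0.5, 2, 3))
plot_decision_regions(toy, toy_rule, main = "Toy rule (illustration only)", step = 0.05)
```

With the notebook's objects, the two cells would reduce to something like `plot_decision_regions(training_set, function(g) predict(classifier, newdata = g), 'Kernel SVM (Training set)')` and the analogous call for `test_set`. Since `predict` on the SVM returns a factor, `as.numeric(y_grid)` yields its level codes, which coincide with the labels only because the levels are 1, 2, 3.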

Out[26]:
